The design of general-purpose processors relies heavily on a workload gathering step in which representative programs are collected from various application domains. Processor performance, when running the workload set, is profiled using simulators that model the targeted processor architecture. However, simulating the entire workload set is prohibitively time-consuming, which precludes considering a large number of programs. To reduce simulation time, several techniques in the literature have exploited internal program repetitiveness to extract and execute only representative code segments. Existing solutions are based on reducing cross-program computational redundancy or on eliminating internal-program redundancy to decrease execution time. In this work, we propose an orthogonal and complementary loop-centric methodology that targets loop-dominant programs by exploiting internal-program characteristics to reduce cross-program computational redundancy. The approach employs a newly developed framework that extracts and analyzes core loops within workloads. The collected characteristics model the memory behavior, computational complexity, and data structures of a program, and are used to construct a signature vector for each program. From these vectors, cross-workload similarity metrics are extracted, which are processed by a novel heuristic to exclude similar programs and reduce redundancy within the set. Finally, a reverse engineering approach is introduced that synthesizes executable micro-benchmarks having the same instruction mix as the loops in the original workload. A tool that automates the steps of the proposed methodology is developed. Simulation results demonstrate that applying the proposed methodology to a set of workloads reduces the set size by half, while preserving the main characteristics of the initial workloads.
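To make the signature-vector step concrete, the following is a minimal sketch of how similar programs could be filtered out of a workload set. It is not the heuristic proposed in this work, whose details are not given here. It assumes cosine similarity as the cross-workload metric and a simple greedy filter with a hypothetical threshold. The function names, the three-component vectors, and the program names are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def reduce_workload_set(signatures, threshold=0.95):
    """Greedy filter (an assumed stand-in for the paper's heuristic).

    A program is kept only if its similarity to every program kept so far
    is below `threshold`. Programs are visited in insertion order.
    """
    kept = []
    for name, vec in signatures.items():
        if all(cosine_similarity(vec, signatures[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical signature vectors:
# [memory intensity, computational intensity, data-structure regularity]
signatures = {
    "prog_a": [0.90, 0.10, 0.50],
    "prog_b": [0.88, 0.12, 0.52],  # close to prog_a
    "prog_c": [0.10, 0.90, 0.20],
    "prog_d": [0.12, 0.85, 0.25],  # close to prog_c
}

kept = reduce_workload_set(signatures)
print(kept)
```

With these made-up vectors the filter retains one program from each pair of near-duplicates. The result depends on visiting order and on the threshold, so a real implementation would need to justify both choices.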